Abstract: This paper presents an approach for side channel 
cryptanalysis with iterative approximate Bayesian inference, 
based on sequential decoding methods. Reliability information 
about subkey hypotheses is generated in the form of likelihoods, 
and sets of subkey hypothesis likelihoods are optimally combined 
into key bit log likelihood ratios. The redundancy of expanded 
keys in multi-round cryptographic schemes is exploited to correct 
round key estimation errors. This is achieved by sequential decod- 
ing, where subkey candidates are sorted by a probabilistic path 
metric and iteratively extended. The M-algorithm is presented 
as a concrete implementation example with deterministic run- 
time behaviour. The resulting algorithm contains previous hard 
decision differential analysis as special case for single-round 
analysis and M=1, and is strictly more accurate otherwise. The 
trade-off between estimation accuracy and complexity is scalable 
by parameter choice. Simulations of an example scenario show 
that the proposed algorithm reduces the number of required 
side channel traces by a factor of two compared to standard 
differential analysis when run with reasonable complexity, over 
the whole investigated signal-to-noise ratio range. 

Index Terms: side channel analysis, sequential decoding, 
differential analysis, Bayesian inference, M-algorithm, cryptanalysis 



I. Introduction 

Side channel analysis is used to infer secret keys of 
cryptographic systems from measurements of some physical 
processing leakage, especially power consumption and elec- 
tromagnetic radiation. Analysis methods are categorized into 
'simple' and differential methods. 'Simple' analysis directly 
classifies measured side channel traces, e.g. simple power 
analysis (SPA) and simple electromagnetic analysis (SEMA). 
Differential methods like differential power analysis (DPA) 
and differential electromagnetic analysis (DEMA) use a-priori 
knowledge about a subkey dependency of some intermediate 
processing result [1], [2], [3]. This a-priori knowledge is 
commonly described as differential analysis selection function, 
which describes the dependence of an information leaking 
variable on a few key and data bits. Data bits in this re- 
spect may be cyphertext bits when analyzing decryption or 
alternatively plain text bits when analyzing encryption. The 
standard method for differential analysis is to partition the 
traces according to a subkey hypothesis and the selection 
function, and to perform a difference-of-means test with a 
side channel leakage model. The most common model for 
power consumption and electromagnetic radiation (which is 
proportional to the derivative of power consumption) is the 



Hamming distance model, i.e. to assume differential side 
channel leakage proportional to the number of switching bits 
on a bus [4]. The number of subkey bits in the selection 
function must be small enough to allow for enumeration: all 
possible subkey hypotheses are tested in turn, and the most 
likely one is chosen as estimation result. A detailed description 
of the standard differential analysis algorithm and comparison 
with several variations is given e.g. in [5]. 

The main motivation of this paper is the following: many 
block cyphers like DES [6] and AES [7] are based on a 
multi-round scheme, where the key is expanded into the round keys 
according to a standardized key schedule. The key expansion 
can in fact be seen as a block code, with a code rate of 
1/R for a scheme with R rounds. This redundancy can in 
principle be exploited for soft-decision error correction in the 
side channel based key estimation. The optimality criterion is 
maximum likelihood sequence estimation (MLSE), i.e. to find 
or approximate the most likely valid expanded key. The idea 
faces two obstacles, for which this paper proposes solutions. 
First, the set of possible round keys is by design too large 
for an enumeration. The presented approach generates key bit 
log-likelihood ratios (LLRs) from differential analysis, which 
makes it possible to sort the key space according to reliability 
and to perform an informed search that visits only a comparatively 
small number of candidate key sequences. Second, to follow 
the decryption (or, analogously, the encryption) over several 
rounds, either the available cyphertext has to be conditionally 
decrypted after each round based on a round key sequence 
hypothesis, or equivalently the selection function has to be 
adapted. This paper uses sequential decoding [8], [9], [10] to 
sort round key sequence hypotheses with a path metric, and 
to iteratively expand the most likely candidates. 

Sequential decoding methods are common knowledge in 
communication theory and especially applied to multiple input 
multiple output (MIMO) demapping and decoding of 
concatenated channel codes (e.g. [11] and the references therein). 
The underlying problem is also to probabilistically and iter- 
atively combine soft information from different sources. The 
contribution of this paper is therefore to apply these well- 
known methods from the area of communications to the area 
of side channel cryptanalysis. The result is a characterization 
of the three-dimensional trade-off between the number of 
required side channel traces, the measurement signal-to-noise 
ratio (SNR) and the applied computational effort. 
Fig. 1. Overview of variable dependencies. 



The remainder of the paper is structured as follows. The 
next section describes the system model and notation. Sec. III 
briefly reviews standard single-bit differential analysis. The 
proposed algorithm is presented in detail in Sec. IV, including 
key bit LLR generation, sequential decoding and multi-round 
soft information combining. The complexity of this algorithm 
as a function of the parameter choices is evaluated in Sec. V. 
The accuracy of the estimation is evaluated by simulation and 
compared to the standard hard-decision differential analysis in 
Sec. VI for different SNR values. The last section discusses the 
results and possible modifications of the presented approach. 

II. System Model and Notation 

Vectors are denoted in bold font. The expectation operator 
is denoted E[·], variance as V[·] and probability as P(·). A 
multi-round cryptographic scheme with R rounds is assumed. 
The expanded key is written as 

$$(\mathbf{k}_1 \ldots \mathbf{k}_R), \tag{1}$$

with the round keys $\mathbf{k}_i$, $i = 1 \ldots R$. Each round key 
consists of $N_B$ bits. 


A. Sensitive variables and selection function for differential 
analysis 

Differential analysis requires a-priori knowledge about at 
least one sensitive intermediate variable, which leaks 
information through the side channel. A typical example is the 
byte-wise S-box lookup for substitution in a 
substitution-permutation network like AES [4]. The selection function 
depends on the specific cryptographic algorithm, and also on 
implementation aspects such as the use of masking [12]. 
The sensitive variable is denoted $\mathbf{v} = (v_1 \ldots v_{N_V})$, and the 
selection function $G$: 

$$\mathbf{v} = G(\mathbf{d}, \mathbf{k}_S) \tag{2}$$

$\mathbf{v}$ depends on the subkey $\mathbf{k}_S$, which comprises several bits of 
the round key $\mathbf{k}_i$, and on some data bits $\mathbf{d}$. The length (number 
of bits) of the subkey is left variable at this point. 
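
To make Eq. (2) concrete, a selection function for an S-box lookup can be sketched as follows. This is only an illustration under assumptions that are not part of the paper: a toy 4-bit S-box (the table is the PRESENT cipher's S-box, chosen purely for brevity) stands in for the byte-wise AES S-box, and the names `SBOX`, `selection_function` and `hamming_weight` are hypothetical.

```python
# Toy selection function v = G(d, k_S): S-box output of (data XOR subkey).
# Assumption: a 4-bit S-box replaces the 8-bit AES S-box mentioned in the text.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_V = 4  # number of bits of the sensitive variable


def selection_function(d, k_s):
    """Return the bits (v_1 ... v_NV) of the sensitive variable, LSB first."""
    word = SBOX[(d ^ k_s) & 0xF]
    return [(word >> i) & 1 for i in range(N_V)]


def hamming_weight(bits):
    """Number of set bits, i.e. the leakage under a Hamming weight model."""
    return sum(bits)
```

For example, `selection_function(0x3, 0x5)` looks up `SBOX[6]`, and every subkey hypothesis `k_s` in `range(16)` yields its own predicted bit vector for the same data value.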

B. Power consumption model and feature extraction 

The power consumption of digital devices depends on 
their transistor switching activity. The two most common 
power consumption models for side channel analysis are the 
Hamming weight model and the Hamming distance model [4]. 
Their applicability depends on the implementation technology 
of the device under test. 



1) Hamming weight model: This model assumes a 
differential current proportional to the number of positive bits 
on a bus (i.e. the Hamming weight). It is applicable for 
devices implemented in precharge logic [4]. The differential 
side channel leakage $l(t)$ of the sensitive variable under this 
model is: 

$$l(t) = \sum_{i=1}^{N_V} v_i(t) \tag{3}$$




2) Hamming distance model: The Hamming distance 
model assumes that the switching of bits on a bus consumes an 
amount of current proportional to the number of switching bits 
(i.e. the Hamming distance of consecutive words on the bus). 
This model is applicable for devices implemented in CMOS 
logic. The side channel leakage of the sensitive variable (e.g. 
at transfer to/from a register) thus depends on the previous 
word on the bus. 

a) Device using software implementation: If the cryptographic 
algorithm is implemented in software, the assumption 
can be made that the sensitive variable is handled at a fixed 
point in the program. Then a constant but unknown previous 
bus state r can be assumed iflD . The differential leakage 
becomes: 



$$l(t) = \sum_{i=1}^{N_V} \left( v_i(t) \oplus r_i \right) \tag{4}$$



with $\oplus$ denoting the XOR operator. iTHl proposes to assume 
all possible reference states r and to use the most likely one, 
computed as highest correlation factor. 

b) Device using hardware implementation: If the 
cryptographic algorithm is implemented in hardware, it might 
be assumed that consecutive values of the sensitive variable 
are stored consecutively in the same register. The differential 
leakage then becomes: 

$$l(t) = \sum_{i=1}^{N_V} \left( v_i(t) \oplus v_i(t-1) \right) \tag{5}$$




c) Comparison: A generalized power model comprising 
both the Hamming weight and the Hamming distance model 
with arbitrary parameters is obtained by conceptually 
assuming a convolutional code $C$ over the sensitive variable on the 
word level (with 0 or 1 tap delay in the presented examples): 

$$l(t) = \sum_{i=1}^{N_V} C(\mathbf{v}(t), \mathbf{v}(t-1)) \tag{6}$$

With respect to the key bits $\mathbf{k}_S$, the resulting differential 
leakage 

$$l(t) = \sum_{i=1}^{N_V} C\big(G(\mathbf{d}(t), \mathbf{k}_S(t)), G(\mathbf{d}(t-1), \mathbf{k}_S(t-1))\big) \tag{7}$$

can be seen as a short block code concatenated with a short 
convolutional code (for decoding of concatenated codes see 
e.g. 133), 03], iflOl ). Unknown code parameters need to be 
known a-priori, guessed or estimated. 
Fig. 2. Differential analysis: conditional densities of received amplitude and 
mean estimation functions for 500 traces at SNR=6dB. 



3) Feature extraction: Side channel measurements usually 
use oversampling with respect to the clock of the device 
under test, especially when measuring an electromagnetic 
side channel. Measurements $y$ use the sampling clock $\tau$, as 
compared to the device using clock $t$. Since the device's 
switching power is of interest, and because power is proportional 
to the squared amplitude of voltage or current (or the related 
fields respectively), the received energy during one target clock 
cycle may be extracted as feature: 

$$f(t) = \sum_{\tau=1}^{N_S} |y(\tau)|^2 \tag{8}$$

where $N_S$ is chosen to capture the half period after the clock's 
rising edge, where the information transfer happens. 

4) Assumption for simulations in this paper: For the sake 
of clarity of the presentation, with the focus lying on key bit 
LLR generation and multi-round soft information combining, 
the Hamming weight model, Eq. (3), is assumed. As discussed, 
a Hamming distance model could be reduced to the Hamming 
weight model by assuming a serially concatenated short 
convolutional code over the sensitive variable. Additive noise $n(t)$ 
on the measurements, caused by the measurement equipment 
itself (like thermal noise of the receive amplifier) as well as 
environmental noise and interference from other switching 
activity in the circuit, is assumed independent of the sensitive 
variable. Simulations in this paper therefore assume: 

$$y(t) = l(t) + n(t), \tag{9}$$

with the SNR: 

$$\mathrm{SNR} = \frac{V[l]}{V[n]} = \frac{\sigma_l^2}{\sigma_n^2} \tag{10}$$

A diagram of the variable dependencies is shown in Fig. 1. 

III. Review of Standard Single-Bit Differential 
Analysis 

The power dissipation of individual switching bits is too 
small to be recognizable with 'simple' analysis. Fig. 2 shows 
the (simulated) distribution of received amplitudes, 
conditioned on one bit $v_i$ being 1 or 0 respectively, i.e. $P(y | v_i = 1)$ 
and $P(y | v_i = 0)$. The figure shows almost identical conditional 
distributions at 6dB SNR. 

As 'standard' single-bit differential analysis it is referred to 
ifD , 0, 0. With the known selection function and a subkey 
hypothesis $\mathbf{k}_S$, the trace data can be partitioned according to 
the value of one $v_i$. For this subkey hypothesis, the difference 
of means over the disjoint trace set partitions is computed: 



$$P_D = E_T[f | v_i = 1] - E_T[f | v_i = -1] \tag{11}$$



For an incorrect subkey hypothesis, the expected result is zero. 
For a correct subkey hypothesis, a nonzero value 
corresponding to the bit's switching energy is expected. Independent noise 
is averaged out by the expectation over the trace partitions. 
The limit for an increasing number of traces $N_T$ is: 

$$\lim_{N_T \to \infty} P_D = \begin{cases} e & \text{if } \mathbf{k}_S \text{ correct,} \\ 0 & \text{else} \end{cases} \tag{12}$$

with $e$ being the received switching energy. The subkey is 
estimated by testing all hypotheses and choosing the one with 
the highest difference of means, corresponding to the maximum 
likelihood (ML) subkey. 
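
The difference-of-means test of Eq. (11) and the subsequent hypothesis selection can be sketched as below. This is a minimal illustration, not the paper's implementation: the function names, the `select_bit` callback and the 0/1 bit convention (instead of the ±1 convention of Eq. (11)) are assumptions made for the example.

```python
from statistics import mean


def difference_of_means(features, bits):
    """Mean feature of the traces with predicted bit 1 minus that with bit 0."""
    ones = [f for f, b in zip(features, bits) if b == 1]
    zeros = [f for f, b in zip(features, bits) if b == 0]
    if not ones or not zeros:
        return 0.0  # degenerate partition: no evidence either way
    return mean(ones) - mean(zeros)


def estimate_subkey(features, data, select_bit, hypotheses):
    """Hard-decision estimate: test every hypothesis, keep the largest P_D.

    select_bit(d, k) is a hypothetical callback returning the predicted
    value (0 or 1) of one sensitive bit for data d and subkey hypothesis k.
    """
    scores = {
        k: difference_of_means(features, [select_bit(d, k) for d in data])
        for k in hypotheses
    }
    return max(scores, key=scores.get), scores
```

A noise-free toy call with a one-bit XOR selection, `estimate_subkey([1.0, 0.0, 1.0, 0.0], [0, 1, 0, 1], lambda d, k: (d ^ k) & 1, [0, 1])`, selects hypothesis 1. A realistic selection function would include a nonlinear S-box so that wrong hypotheses decorrelate from the leakage.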

IV. Proposed Algorithm 

The proposed algorithm is an approximate maximum 
likelihood sequence estimation (MLSE), where the accuracy can 
be scaled by the invested computational effort. A difference 
to the standard differential analysis algorithm is that not only 
the most likely subkey hypothesis is used, which would mean 
a premature quantization of an intermediate result, but that all 
tested hypotheses are used and optimally combined using a 
probabilistic formulation. This probabilistic formulation also 
allows for the combination of different selection functions. For 
stochastic inference, key bits are modelled as independent 
random variables. Their probability distribution functions are 
computed from partial problems, and iteratively updated 
according to Bayes' theorem whenever an additional problem 
constraint or information source is joined (Bayesian network, 
belief propagation [16]). Quantization of key bit probabilities 
into binary values is done only at the end of the algorithm, to 
avoid error propagation. Since the key search space even in one 
round is by design far too large to consider all possibilities, an 
informed graph search is used which avoids looking at most 
possible key values. This is enabled by iterative sorting of 
likelihoods of parts of the joint solution and pruning the search 
tree. 

A. Problem structure: factorize expanded key probability 
density function 

The joint probability density function of all random vari- 
ables is factorized using conditional probabilities and Bayes' 
theorem, to break down the estimation problem into smaller 
components, which can be treated with limited computational 
effort. This complexity reduction compared to joint estimation 
exploits conditional independencies between variables [16]. 




For notational as well as for computational convenience, 
bit values are denoted as $\{-1, +1\}$ in the following, instead 
of the normal $\{0, 1\}$. Factorizing the expanded key estimation 
problem into the round key estimation problems gives: 

$$\underbrace{P(\mathbf{k}_{i \ldots 1})}_{\text{posterior probability}} = \underbrace{P(\mathbf{k}_i | \mathbf{k}_{i-1 \ldots 1})}_{\text{likelihood}} \cdot \underbrace{P(\mathbf{k}_{i-1 \ldots 1})}_{\text{prior probability}} \tag{13}$$

where $\mathbf{k}_{i \ldots 1}$ is the sequence of round keys $\mathbf{k}_i$ down to $\mathbf{k}_1$. 
Bayesian updating can then be applied iteratively over the 
rounds. The formulation with conditional probabilities turns 
the problem into a tree code model, where a graph search can 
be run for the 'correct', i.e. the most likely leaf. The sequential 
decoding approach offers several implementation algorithms, 
among them the stack algorithm [8], the M-algorithm [17] 
and the T-algorithm [18]. The stack algorithm corresponds to 
a depth first search, while the M-algorithm and T-algorithm 
correspond to a breadth first search, where the number of 
extended 'most likely' nodes per level is limited (constant for 
the M-algorithm and variable for the T-algorithm). 

B. Generate soft information: key bit log-likelihood ratios 

Log-likelihood ratios (LLRs) are a convenient way to ex- 
press a key bit's distribution function in one number. Another 
advantage of computations in the log domain is that multi- 
plications necessary to compute the probability of a sequence 
of independent bits are effectively turned into additions. The 
LLR of a bit $b$ is: 

$$L(b) = \log \frac{P(b = +1)}{P(b = -1)} \tag{14}$$

and the inverse: 

$$P(b = \pm 1) = \frac{e^{\pm L(b)/2}}{e^{+L(b)/2} + e^{-L(b)/2}} \tag{15}$$

The sign of the LLR indicates the likely bit value, and the 
LLR magnitude its reliability. 
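
The mapping between a bit probability and its LLR, Eqs. (14) and (15), can be sketched in a few lines. The function names are hypothetical, and the direct use of `math.exp` assumes moderate LLR magnitudes (very large values would overflow and need a numerically safer form).

```python
import math


def llr_from_prob(p_plus):
    """L(b) = log(P(b=+1) / P(b=-1)) for 0 < p_plus < 1, as in Eq. (14)."""
    return math.log(p_plus / (1.0 - p_plus))


def prob_from_llr(llr, b):
    """P(b = +1) or P(b = -1) from the LLR, as in Eq. (15); b is +1 or -1."""
    return math.exp(b * llr / 2) / (math.exp(llr / 2) + math.exp(-llr / 2))
```

An LLR of zero corresponds to a completely unreliable bit (`prob_from_llr(0.0, +1)` is 0.5), and the two probabilities for `b = +1` and `b = -1` always sum to one.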

Fig. 4. LLR distribution for correct subkey hypothesis of 1 bit length 
and positive bit value, for different SNR and different number of traces. 
Distribution for a wrong previous round key estimate also shown. 

1) Estimate subkey likelihoods: To generate LLRs from 
differential analysis, the reproduction property of the Gaussian 
distribution is used: the estimation function of the mean 
value of a Gaussian distribution with $N_X$ samples is again 
Gaussian distributed, but with $N_X$ times smaller variance. Fig. 2 
shows not only the conditional received values, but also the 
estimation functions of the conditional mean values. If it is: 



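$$P(f | v_i = +1) \propto \mathcal{N}(e, \sigma_n) \tag{16}$$

then the estimation function of the mean $\mu_p$ for positive $v_i$ is: 

$$P(\mu_p) \propto \mathcal{N}\left(e, \frac{\sigma_n}{\sqrt{N_T/2}}\right) \tag{17}$$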



The likelihood of the mean value $\mu_p$ for positive $v_i$ being 
larger than the mean value $\mu_n$ for negative $v_i$ can be 
approximated as the Gaussian tail distribution: 

$$P(\mu_p > \mu_n) \approx Q\left( \frac{\mu_n - \mu_p}{\sigma_n / \sqrt{N_T/2}} \right) \tag{18}$$

where $\mu_p$ and $\mu_n$ are the conditional mean values computed 
over the trace partitions, and $\sigma_n / \sqrt{N_T/2}$ is the 
correspondingly computed standard deviation of the estimation functions 
(assuming equal partition sizes). The tail probability is 
illustrated in Fig. 3. Computation of this value can be implemented 
as a table lookup of the Q-function. The likelihood of the 
subkey hypothesis to be correct is the likelihood of $\mu_p$ being 
larger than $\mu_n$: 

$$P(\mathbf{k}_S) = P(\mu_p > \mu_n) \tag{19}$$

Thus, likelihoods can be generated for all possible subkey 
hypotheses ks. 
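
A sketch of the subkey likelihood computation is given below. It rests on one reading of Eq. (18), namely that the Q-function argument is the difference of the partition means normalized by the standard deviation of the mean estimate, $\sigma_n / \sqrt{N_T/2}$, with equal partition sizes. The function names and the use of `math.erfc` instead of a lookup table are assumptions of this example.

```python
import math
from statistics import mean


def q_function(x):
    """Gaussian tail probability Q(x) = P(X > x) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def subkey_likelihood(feats_pos, feats_neg, sigma_n):
    """Likelihood that the 'positive' partition mean exceeds the 'negative' one.

    feats_pos / feats_neg: features partitioned by one subkey hypothesis.
    sigma_n: noise standard deviation of a single feature (assumed known).
    """
    n_t = len(feats_pos) + len(feats_neg)
    sigma_mu = sigma_n / math.sqrt(n_t / 2)  # std. dev. of the mean estimate
    return q_function((mean(feats_neg) - mean(feats_pos)) / sigma_mu)
```

Calling this once per hypothesis yields the set of likelihoods that the next step combines into key bit LLRs. Identical partitions give a likelihood of 0.5, i.e. no information.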

2) Compute key bit LLRs from subkey likelihoods: The 
number and length of the subkey hypotheses depend on the 
selection function. For $s$ bits length, there are $2^s$ hypotheses. 
Each of the $i = 1 \ldots s$ bit positions is contained with positive 
bit value and negative bit value in $2^{s-1}$ hypotheses each. This 
entails that there is always at least one hypothesis with positive 
key bit value and one counterhypothesis with this bit being 
negative. The key bit LLRs can be computed as: 

$$L(b_i) = \log \frac{\sum_{\mathbf{k}_S \in \mathcal{S}_{i+}} P(\mathbf{k}_S)}{\sum_{\mathbf{k}_S \in \mathcal{S}_{i-}} P(\mathbf{k}_S)} \tag{20}$$

where $\mathcal{S}_{i+}$ denotes the set of all hypotheses where bit number 
$i$ has positive value, and $\mathcal{S}_{i-}$ the set where it has negative 
value. 

Fig. 5. M-algorithm: nodes are expanded into $M_E$ children, of which the 
most likely $M_R$ per level are further evaluated (example $M_E = 5$, $M_R = 3$). 

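To reduce the computational effort, the Max-Log 
approximation [19] can be used: 

$$\log\left(\sum_k a_k\right) \approx \max_k \left(\log(a_k)\right) \tag{21}$$

which yields 

$$L(b_i) \approx \max_{\mathbf{k}_S \in \mathcal{S}_{i+}} \left(\log P(\mathbf{k}_S)\right) - \max_{\mathbf{k}_S \in \mathcal{S}_{i-}} \left(\log P(\mathbf{k}_S)\right) \tag{22}$$

Fig. 4 shows conditional LLR distributions for positive bit 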
value, for different SNR and different number of traces. It 
also shows an LLR distribution for a wrong previous round 
key. The improvement in expected LLR magnitude and classi- 
fication accuracy with increasing SNR and increasing number 
of traces is clearly visible. 

3) Combining independent selection functions: If LLRs for 
the same bit are generated from independent selection func- 
tions, these LLRs are simply added for information combining. 
Different subkey hypothesis lengths are no problem in this 
formulation. 

C. Sequential decoding: multi-round soft information combining 

The proposed implementation uses the M-algorithm, which 
is illustrated in Fig. 5. It is a breadth-first tree search with 
deterministic limited complexity. Each node in a tree is 
reachable from the root with exactly one path. Visited nodes are 
assigned the path metric, and in each tree level only the M 
best nodes (best path metrics) are expanded into the next level. 
The algorithm finds the leaf with the 'approximately best' 
metric, where the approximation is better for larger M. The 
M-algorithm comprises the 'greedy' heuristic for the special 
case M=1. 
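
The breadth-first search with a fixed number of survivors per level can be sketched generically as below. This is a simplified illustration and not the paper's implementation: the `expand` callback, the tuple representation of paths and the convention that larger metrics are better are assumptions of this example.

```python
def m_algorithm(expand, depth, m):
    """Breadth-first tree search keeping the m best paths per level.

    expand(path) returns an iterable of (symbol, branch_metric) pairs for
    the children of the node reached by 'path'; larger metrics are better.
    Returns (path_metric, path) of the best leaf found.
    """
    survivors = [(0.0, ())]  # root: empty path with metric 0
    for _ in range(depth):
        children = [
            (metric + branch_metric, path + (symbol,))
            for metric, path in survivors
            for symbol, branch_metric in expand(path)
        ]
        children.sort(key=lambda c: c[0], reverse=True)
        survivors = children[:m]  # prune the level to the m best nodes
    return survivors[0]
```

With `m=1` this reduces to the greedy heuristic. Larger `m` keeps paths alive whose early metric is not the best, so that later levels can still promote them.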

1) Path metric: In order to find the MLSE solution for the 
expanded key, the key bit sequence log-likelihood is chosen 
as path metric. This is a computationally convenient choice 
because the Max-Log approximation can also be applied to 
the mapping from LLR to probability (Eq. (15)), e.g. ifTTl : 

$$\log(P(b_i = -1)) = \log \frac{1}{1 + e^{L(b_i)}} = \log 1 - \log\left(e^{0} + e^{L(b_i)}\right) \approx -\max(L(b_i); 0) \tag{23}$$



The Max-Log approximated log-likelihood of a sequence of 
bits, e.g. a round key, then becomes: 



log(P(b)) 



(24) 



Summands increase the sequence log-likelihood if both the L- 
value and the bit have equal signs; otherwise they reduce it. Bit 
positions contribute to the metric according to their reliability 
(LLR magnitude). An overview of the proposed algorithm in 
pseudo-code formulation is given in Alg. 1. 
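
The effect of the Max-Log approximation on the bit log-probability, and the resulting sequence metric, can be checked numerically with the sketch below. It assumes that the sequence metric is the correlation $\sum_i b_i L(b_i)$ between candidate bits and LLRs. That is one common form which matches the description of the summands above, and it differs from the Max-Log sequence log-likelihood only by a scaling of 1/2 and a term that does not depend on the candidate bits. The function names are hypothetical.

```python
import math


def log_prob_exact(b, llr):
    """Exact log P(b_i = b) from the LLR (log of Eq. (15)), b in {+1, -1}."""
    return b * llr / 2 - math.log(math.exp(llr / 2) + math.exp(-llr / 2))


def log_prob_max_log(b, llr):
    """Max-Log approximation; for b = -1 this is -max(L; 0) as in Eq. (23)."""
    return -max(-b * llr, 0.0)


def sequence_metric(bits, llrs):
    """Correlation metric: grows when bit and LLR signs agree."""
    return sum(b * l for b, l in zip(bits, llrs))
```

For any candidate, the sum of `log_prob_max_log` over all bits equals `(sequence_metric(bits, llrs) - sum(abs(l) for l in llrs)) / 2`, so both quantities rank candidates identically.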

2) Node expansion: generate child round key candidate 
set: This section describes the node expansion into its child 
nodes. The child round key candidate set is generated from the 
LLRs for the current round's bit positions (which are obtained 
according to Sec. IV-B). Since the round LLR generation needs 
the data (cyphertext or plaintext) corresponding to the traces, 
this can be directly applied only in the first round. For a later 
round $i$, the data first needs to be decrypted according to the 
current candidate's previous round keys $\mathbf{k}_{i-1 \ldots 1}$ (conditional 
decryption, Alg. 1). The children's round metrics therefore 
follow from Eq. (24) as: 

$$\beta_R(\mathbf{k}_i | \mathbf{k}_{i-1 \ldots 1}) = \sum_{j=1}^{N_B} k_{i,j} \cdot L(k_{i,j} | \mathbf{k}_{i-1 \ldots 1}) \tag{25}$$

By design of the cryptographic algorithm, it is again infeasible 
to visit all child nodes even for a single node expansion. The 
expansion into the $M_E$ 'best' child nodes therefore again uses 
a heuristic. 

One possible heuristic to find the '$M_E$-best' children of a 
node is described in the following. It computes round metrics 
$\beta_R$ for a search space of $M_S > M_E$ candidate children, which 
is determined as follows: 

• sort the conditional round LLRs $L(k_{i,j} | \mathbf{k}_{i-1 \ldots 1})$ according to 
magnitude 

• the most likely child is determined by the LLR signs 

• the other $M_S - 1$ child candidates are determined as bit 
deviations from the most likely child: 

- enumeration subspace: for the $n_E$ positions with 
smallest LLR magnitude (most unreliable), all $2^{n_E}$ 
bit combinations are enumerated 

- combinatorial subspace: for the $N_B - n_E$ remaining 
bit positions, up to $n_C$ bit deviations are considered 

- the search space is the Cartesian product of the 
enumeration subspace and the combinatorial subspace 

• the $M_S$ metrics are computed 

• the $M_E$ children with best metric are returned. 

A more detailed description of this node expansion heuristic 
is listed as pseudo-code (similar to Octave/Matlab notation) in 
Alg. 2. 
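
The steps of this heuristic can also be sketched in Python. This is an independent illustration of the enumeration and combinatorial subspaces, not a transcription of the paper's pseudo-code. The function name, the parameter names `n_e` and `n_c`, and the round metric $\sum_j k_{i,j} L_j$ are assumptions of the example.

```python
from itertools import combinations, product


def get_best_children(llrs, m_e, n_e, n_c):
    """Return the m_e best (metric, candidate) pairs for one node expansion.

    llrs: conditional round LLRs, one per key bit
    n_e:  number of least reliable positions that are fully enumerated
    n_c:  maximum number of bit flips among the remaining positions
    """
    n_b = len(llrs)
    # bit positions sorted by reliability, most reliable first
    order = sorted(range(n_b), key=lambda j: abs(llrs[j]), reverse=True)
    reliable, unreliable = order[:n_b - n_e], order[n_b - n_e:]
    best = [1 if l >= 0 else -1 for l in llrs]  # most likely child: LLR signs
    candidates = []
    for n_flip in range(n_c + 1):
        for flips in combinations(reliable, n_flip):      # combinatorial part
            for pattern in product((1, -1), repeat=n_e):  # enumeration part
                cand = list(best)
                for j in flips:
                    cand[j] = -cand[j]
                for j, dev in zip(unreliable, pattern):
                    cand[j] *= dev
                metric = sum(b * l for b, l in zip(cand, llrs))
                candidates.append((metric, tuple(cand)))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:m_e]
```

For LLRs `[4.0, -0.5, 3.0, 0.2]` with `n_e=2` and `n_c=1`, twelve candidates are evaluated and the best three differ from the sign pattern only in the two unreliable positions.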

3) Reduce candidate set with path metric: The number of 
candidates on one tree level is then reduced to $M_R < M_E$ 
using the path metric $\beta_P$: 

$$\beta_P(\mathbf{k}_{i \ldots 1}) = \underbrace{\beta_R(\mathbf{k}_i | \mathbf{k}_{i-1 \ldots 1})}_{\text{round log-likelihood}} + \underbrace{\beta_P(\mathbf{k}_{i-1 \ldots 1})}_{\text{a-priori log-probability}} \tag{26}$$

The labels under the braces indicate that the extension of 
the path metric by one level is Bayesian updating (Eq. (13)) 
in the log domain when adding the new round's information, 
i.e. the a-posteriori log-probability is computed. 

The reduction of the candidate set to $M_R$ candidates is 
possible because the expected round LLRs are zero for any 
wrong round key in the path candidate (compare Fig. 4). The 
expected best child's round metric is: 

$$E\left[\max_{\mathbf{k}_i} \beta_R(\mathbf{k}_i | \mathbf{k}_{i-1 \ldots 1})\right] = \begin{cases} N_B \cdot E\left[|L(k_{i,j} | \mathbf{k}_{i-1 \ldots 1})|\right] & \text{if } \mathbf{k}_{i-1 \ldots 1} \text{ correct,} \\ 0 & \text{else} \end{cases} \tag{27}$$

which gives a significant difference in the path metric values 
for candidates which contain a false round key. 



input : T traces and data 
output: 'most likely' expanded key candidate 

// special treatment for round 1: 
foreach Byte per round do 
    LLRs <- getLLRs (traces, data) ; 
end 

// get M_E best round key candidates: 
candidate [] <- getBestChilds (LLRs, M_E) ; 

// other rounds: 
for round r <- 2 to R do 
    foreach candidate do 
        rounddata <- roundDecrypt (data, candidate) ; 
        foreach Byte per round do 
            LLRs <- getLLRs (traces, rounddata) ; 
        end 
    end 

    /* compute path metrics, reduce to M_R candidates: */ 
    candidate [] <- reduceCandidates (candidate [], M_R) ; 

    foreach candidate do 
        candidate [] <- getBestChilds (LLRs, M_E) ; 
    end 
end 

return reduceCandidates (candidate [], 1) ; 

Algorithm 1: Pseudo-code overview of implementation 
using M-algorithm, assuming byte-wise processing. 



V. Complexity 

The complexity measure of interest for application would be 
the time complexity (number of clock cycles) on the hardware 
available to run the algorithm. But this cycle count depends 
on the instruction set architecture of the processor cores 
used, and special hardware acceleration, e.g. with FPGAs, is 
possible. Therefore, instead of assuming any special instruction 
set, a high level complexity evaluation as a function of the 
parameters $M_E$, $M_R$, $n_E$, $n_C$, and $R$ is given (assuming the 
node expansion heuristic from Alg. 2). 

a) Number of visited nodes: One child expansion visits 
$M_S$ candidates, with: 

$$M_S = 2^{n_E} \cdot \sum_{i=0}^{n_C} \binom{N_B - n_E}{i} = 2^{n_E} \cdot \sum_{i=0}^{n_C} \frac{(N_B - n_E)!}{(N_B - n_E - i)! \cdot i!} \tag{28}$$

$M_E$ candidates are returned for each expansion, and the tree 
is pruned for each level to $M_R$ round survivors. The number 
of visited nodes is therefore: 

$$N_{\text{visits}} = M_S + M_R \cdot M_S \cdot (R - 1) \tag{29}$$
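
Eqs. (28) and (29) are easy to evaluate for a given parameter set, as in the sketch below. The function names are hypothetical, and the parameter values in the usage note are illustrative only, since the paper does not state its choices of $n_E$ and $n_C$.

```python
from math import comb


def search_space_size(n_b, n_e, n_c):
    """M_S: candidates visited by one child expansion, as in Eq. (28)."""
    return 2 ** n_e * sum(comb(n_b - n_e, i) for i in range(n_c + 1))


def visited_nodes(m_s, m_r, rounds):
    """Total number of visited nodes over all rounds, as in Eq. (29)."""
    return m_s + m_r * m_s * (rounds - 1)
```

For instance, `search_space_size(128, 8, 2)` evaluates to 1858816 candidates per expansion, which shows how quickly the enumeration subspace dominates the effort as `n_e` grows.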



b) Metric computations: In a straightforward 
implementation, one complete metric would be computed per visited 
node. But since the metric is additive over path segments (Eq. 
(26)), the reuse of intermediate results is possible. And since 
children's round metrics are computed as sums of the same 
LLRs with different signs (Eq. (25)), further reuse is possible 
with a Gray code binary enumeration [21]. 
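
The reuse enabled by a Gray code enumeration can be illustrated as follows: consecutive sign patterns differ in exactly one bit, so each metric follows from the previous one with a single update instead of a full sum. This is a sketch for the enumeration subspace only. The function name is hypothetical and the metric $\sum_i b_i L_i$ is the same assumption as in the node expansion.

```python
def gray_code_metrics(llrs):
    """Metrics sum_i b_i * L_i for all 2**n sign patterns b in {+1, -1}**n.

    Patterns are visited in reflected Gray code order, so that every step
    flips one bit and costs one subtraction.
    """
    n = len(llrs)
    bits = [1] * n
    metric = sum(llrs)
    metrics = {tuple(bits): metric}
    for k in range(1, 2 ** n):
        # position of the lowest set bit of k = bit flipped at Gray step k
        j = (k & -k).bit_length() - 1
        metric -= 2 * bits[j] * llrs[j]
        bits[j] = -bits[j]
        metrics[tuple(bits)] = metric
    return metrics
```

Accumulated floating point rounding can be bounded by recomputing the full sum at regular intervals if very long enumerations are used.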

c) Compare/Select operations for candidate selection: 
Child expansion needs to find the $M_E$ best values out of $M_S$. 
Round candidate reduction needs to find the $M_R$ best out of $M_E$ 
after the first round, and out of $M_E \cdot M_R$ later. Different 
sorting algorithms like bubble sort or comb sort are possible, 
but other algorithms like insertion sort may be preferable 
for parallelization speedup on modern processors with SIMD 
instructions. Again, a reuse may be possible with adequate 
data structures. 

d) Number of round decryptions: After each round, $M_R$ 
decryptions are needed, in sum: 

$$N_{\text{decrypt}} = M_R \cdot (R - 1) \tag{30}$$



VI. Accuracy 



An evaluation of the estimation accuracy by simulation is shown 
in Fig. 6. The scenario assumes $R = 10$ rounds with 
$N_B = 128$ bit round key length, and a selection function 


function Mbest <- getBestChildsApprox (LLRs, 
        ME, searchBitsEnum, numCorrectErrors) ; 

input : round LLR vector conditioned on previous round 
        key path hypothesis 
output: ME 'most likely' round key candidates 

numL <- length (LLRs) ; 

/* sort according to reliability: */ 
[LSorted, Lperm] <- sort (LLRs, 'descend') ; 
bestCandidatePerm <- sign (LLRs) ; 
sizeEnum <- 2^searchBitsEnum ; 
searchEnum <- de2bi (0:sizeEnum-1, 'left-msb') ; 

/* bit values ±1: */ 
zIdxs <- find (searchEnum == 0) ; 
searchEnum (zIdxs) <- -1 ; 

candCombIdxs <- cell (numCorrectErrors, 1) ; 
nCombSep (0) <- 1 ; 
nComb <- nCombSep (0) ; 
for i <- 1 : numCorrectErrors do 
    candCombIdxs(i) <- nchoosek (1:128 - searchBitsEnum, i) ; 
    nCombSep (i) <- nchoosek (128 - searchBitsEnum, i) ; 
    nComb <- nComb + nCombSep (i) ; 
end 

numCand <- nComb * 2^searchBitsEnum ; 
candCombBinary <- ones (numCand, numL) ; 
pos <- 1 ; 

for i <- 1 : numCorrectErrors do 
    for j <- 1 : nCombSep (i-1) * sizeEnum do 
        pos <- pos + 1 ; 
        /* deviations from most likely round candidate: */ 
        candCombBinary (pos, candCombIdxs(i)(mod (j, nCombSep (i))+1, :)) <- -1 ; 
        candCombBinary (pos, numL - searchBitsEnum + 1 : end) 
            <- searchEnum (mod (j, sizeEnum)+1, :) ; 
    end 
end 

rCandPerm <- bestCandidatePerm .* candCombBinary ; 

/* compute metrics, get ME best: */ 
metrics <- sum ((rCandPerm .* kron (ones (numCand, 1), LSorted)).') ; 
[candMetr, candIdx] <- sort (metrics, 'descend') ; 
bestCandIdxsPerm <- candIdx (1:ME) ; 
bestRCandBinPerm <- rCandPerm (bestCandIdxsPerm, :) ; 

/* undo the permutation of the initial sorting: */ 
[dummy, unpermIdxs] <- sort (Lperm, 'ascend') ; 
return Mbest <- bestRCandBinPerm (:, unpermIdxs) ; 

Algorithm 2: Pseudo-code (similar to Octave / Matlab 
notation [20]) to approximately find the best ME child 
candidates. 
Fig. 6. The proposed algorithm reduces the number of required traces 
compared to the standard algorithm by a factor of two in a configuration 
with reasonable complexity. 



with subkey length of 1 bit. The proposed algorithm is run 
with $M_E = 10000$ and $M_R = 20$ and compared for different 
SNR with the standard single-round 'hard decision' estimation 
(which directly quantizes to the ML subkey hypotheses). 
The simulation computes the expected number of traces required 
to successfully estimate the secret key. The simulation results are 
obtained with a scripting language on a single PC, which may 
be seen as an indication of reasonable computational complexity. 
Over the complete investigated SNR range, the proposed 
algorithm reduces the required number of traces roughly by a 
factor of two (corresponding to a 3dB SNR gain). Accuracy 
can be further improved by increasing $M_E$ and $M_R$. 

VII. Discussion 

This paper describes approximate Bayesian inference for 
side channel secret key estimation in the sense of MLSE. 
The probabilistic formulation allows for optimal combination 
of information, for example from different key substring 
hypotheses, different selection functions, and different processing 
rounds. It makes it possible to exploit redundancy in the 
expanded key of multi-round cryptographic algorithms for 
soft-decision estimation error correction. Computational effort is 
reduced by using conditional independencies of variables, and 
is scalable with sequential decoding methods. 

The example implementation using the M-algorithm 
includes standard differential analysis as one special case (for 
$M_E = M_R = 1$ if the subkey hypothesis length is one bit, 
and using only one round) and has strictly better estimation 
accuracy otherwise. For longer subkey hypothesis lengths the 
proposed algorithm is more accurate due to probabilistic 
combining of hypothesis likelihoods. It further performs better 
if multiple rounds are considered, because intermediate errors 
of early quantization are avoided and can be corrected over 
the path metric. This higher accuracy comes at the price of 
increased computational complexity. 

Several modifications and extensions of the presented meth- 
ods are possible. They could be combined with different side 



channel analysis algorithms other than single-bit differential 
analysis. Other sequential decoding implementations like the 
stack algorithm or T-algorithm are applicable as well. It is fur- 
ther possible to terminate early, i.e. not to follow all processing 
rounds (some proposed Turbo Decoder implementations for 
example use a stopping criterion based on LLR magnitudes). 
The presented example implementation also did not yet en- 
force structural constraints like the key expansion algorithm. 
Pruning the code tree can be improved by considering only 
valid expanded key sequences. 

With sequential decoding, the side channel analysis 
problem becomes a 3-dimensional trade-off between measurement 
SNR, number of required traces and computational complexity. 
It becomes possible for example to use cheaper measurement 
equipment (worse SNR) by computing a bit more, or to 
achieve a new minimum number of required traces by 
investing high computational effort. Future work may consist 
of further characterizing this trade-off by comparing algorithm 
modifications and identifying Pareto-efficient ones. 